protocols/identify: Fix race condition in discover_peer_after_disconnect#2744
protocols/identify: Fix race condition in discover_peer_after_disconnect#2744mxinden merged 2 commits intolibp2p:masterfrom
Conversation
**Summary** of the plot of the `discover_peer_after_disconnect` test: 1. `swarm2` connects to `swarm1`. 2. `swarm2` requests an identify response from `swarm1`. 3. `swarm1` sends the response to `swarm2`. 4. `swarm2` disconnects from `swarm1`. 5. `swarm2` tries to disconnect. **Problem** `libp2p-identify` sets `KeepAlive::No` when it identified the remote. Thus `swarm1` might identify` `swarm2` before `swarm2` identified `swarm1`. `swarm1` then sets `KeepAlive::No` and thus closes the connection to `swarm2` before `swarm2` identified `swarm1`. In such case the unit test `discover_peer_after_disconnect hangs indefinitely. **Solution** Add an initial delay to `swarm1` requesting an identification from `swarm2`, thus ensuring `swarm2` is always able to identify `swarm1`.
|
Thanks @mxinden!
fn connection_keep_alive(&self) -> KeepAlive {
if self.is_outbound_ongoing || self.is_inbound_ongoing {
KeepAlive::Yes
} else {
KeepAlive::No
}
}(
Edit: I just noticed that my above idea does not make sense, so please ignore. |
I am assuming the same, and I have not seen an indication for this being a problem in the wild. In case we ever do come across this, we can still introduce the |
Description
Summary of the plot of the
discover_peer_after_disconnecttest:swarm2connects toswarm1.swarm2requests an identify response fromswarm1.swarm1sends the response toswarm2.swarm2disconnects fromswarm1.swarm2tries to disconnect.Problem
libp2p-identifysetsKeepAlive::Nowhen it identified the remote. Thusswarm1mightidentify
swarm2beforeswarm2identifiedswarm1.swarm1then setsKeepAlive::Noand thus closes the connection toswarm2beforeswarm2identifiedswarm1. In such case the unit testdiscover_peer_after_disconnect hangs indefinitely.Solution
Add an initial delay to
swarm1requesting an identification fromswarm2, thus ensuringswarm2is always able to identify
swarm1.Links to any relevant issues
Fixes #2743
Open Questions
We could as well introduce
libp2p-pingand setPingConfig::with_keep_alive. I think the fix here is sufficient though.As another alternative, instead of setting
KeepAlive::Noonce we identified the remote, we could as well setKeepAlive::Until(Instant::now() + Duration::from_secs(10)), thus giving the remote a chance to identify us.Change checklist